ABSTRACT

We fit a parametric model comprising a mixture of multi-dimensional Gaussian
functions to the 3.6 to 8μm colour and optical photometric redshift distribution
of galaxy populations in the ELAIS-N1 and Lockman Fields of the Spitzer
Wide-area Infrared Extragalactic Legacy survey (SWIRE). For 16,698 sources
in ELAIS-N1 we find our data are best modelled (in the sense of the Bayesian
Information Criterion) by the sum of four Gaussian distributions or modes (Ca,
Cb, Cc and Cd).

We compare the fit of our empirical model with predictions from existing
semi-analytic and phenomenological models. We infer that our empirical model provides
a better description of the mid-infrared colour distribution of the SWIRE survey
than these existing models. This colour distribution test is thus a powerful model
discriminator and is entirely complementary to comparisons of number counts.

We use our model to provide a galaxy classification scheme and explore the
nature of the galaxies in the different modes of the model. Population Ca is found
to consist of dusty star-forming systems such as ULIRGs, over a broad redshift
range. Low redshift late-type spirals are found in population Cb, where PAH
emission dominates at 8μm, making these sources very red in longer wavelength
IRAC colours. Population Cc consists of dusty starburst systems with high levels
of star-formation activity at intermediate redshifts. Low redshift early-type
spirals and ellipticals are found to dominate Population Cd. We thus find a greater
variety of galaxy types than one can with optical photometry alone.

Finally we develop a new technique to identify unusual objects, and find a
selection of outliers with very red IRAC colours. These objects are not detected
in the optical, but have very strong detections in the mid-infrared. These sources
are modelled as dust-enshrouded, strongly obscured AGN, where the high
mid-infrared emission may be attributed either to dust heated by the AGN or to
substantial star-formation. These sources have z_ph ~ 2 - 4, making them incredibly
infrared luminous, with a Lij. ~ 1O^^'^~^^'^L0. 

Subject headings: classification — infrared: galaxies — galaxies: evolution — 
methods: statistical 



1. Introduction 

Wide-field survey astronomy is revolutionising astrophysical research, in particular the
study of galaxy evolution. Surveys such as SDSS (York et al. 2000), 2dF (Colless 1999),
IRAS (Neugebauer et al. 1984), 2MASS (Kleinmann et al. 1994), FIRST (Becker et al.
1994) and NVSS (Condon et al. 1998) now provide us with a detailed multi-wavelength picture
of millions of galaxies in the local Universe. The Spitzer Wide-area Infrared Extragalactic
Legacy survey (SWIRE - Lonsdale et al. 2003, 2004) now provides us with a similarly detailed
sample of galaxies at z ~ 1. These huge and complex data sets require us to apply a variety
of new techniques to extract the vast wealth of information they contain.

In this paper we explore a parametric technique for studying the colour distribution 
of the SWIRE galaxies, providing: a compact description of the data; a method for source 
classification; and a recipe for the identification of outliers. 

The statistical properties of galaxies in blank field surveys can be used to understand 
galaxy evolution in a number of ways. The most basic approach is to explore the surface 
density of sources as a function of their flux in a single band and compare this with models.
Such "number count" analyses have provided clear evidence for galaxy evolution since early
radio surveys (Rowan-Robinson 1968). Spitzer number counts have already revealed strong
evolution in far-infrared bands (e.g. Papovich et al. 2004, Dole et al. 2004), and in
mid-infrared bands (e.g. Fazio et al. 2004). The SWIRE number counts will be discussed by
Shupe et al. (2006) and Surace et al. (2006). Spectroscopic or photometric redshifts permit
a more direct understanding of the properties of galaxies through the study of luminosity
functions (e.g. SWIRE luminosity functions - Babbedge et al. 2005, Onyett et al. 2005).
Analysing the observed-frame colour distribution of galaxies is a natural extension of the 
number count analysis and provides an alternative to photometric redshifts for exploiting 
the colour information in extra-galactic surveys. Until recently, neither models nor observations
have been sophisticated enough to exploit such colour-based techniques. With the
mid-to-far infrared wavelength coverage of Spitzer (Werner et al. 2004) in seven bands and at
sensitivities not previously encountered in any infrared surveys, we have now reached a level
of maturity where such a colour-based analysis is possible.

New surveys also bring new challenges in the classification of sources. Methods include
χ² minimisation to fit the spectral energy distribution (SED) of each source with a library
of galaxy templates, e.g. Bolzonella et al. (2000), Farrah et al. (2003), Rowan-Robinson
et al. (2003). Such template fitting provides both spectral type and photometric redshift
estimates, but success depends upon the pre-defined templates being representative of the
galaxies under consideration, limiting our confidence as we explore new regions of parameter
space. More recent techniques based on density estimation in colour space (Connolly et al.
2000) have been particularly successful in identifying sub-samples of sources, such as high
redshift QSOs in the Sloan Digital Sky Survey (SDSS - York et al. 2000).

In this paper, we apply an efficient and robust technique to model the colour distribu- 
tion of sources from the Spitzer Wide-area InfraRed Extragalactic legacy survey (SWIRE - 
Lonsdale et al. 2003, 2004). We compare this parametric description of the data with exist- 
ing phenomenological and physical galaxy population models. We use our model to classify 
galaxies and interpret these classifications by comparison with traditional galaxy templates. 

We use results from Spitzer (3.6μm to 24μm) and optical U, g', r', i', Z imaging of the
SWIRE fields of ELAIS-N1 and Lockman. We model the four-dimensional (3 colour, optical
photometric redshift) distribution of galaxies with a mixture of multi-variate Gaussians using
an Expectation Maximization (EM) Algorithm (Nichol et al. 2000; Connolly et al. 2000).
Every source is then classified as belonging to one of these Gaussian "distributions" or
"modes". The advantage of this classification technique over traditional template fitting is
that a direct application of pre-determined galaxy template libraries is not required, allowing 
the identification, classification and characterisation of both existing and new object types. 

A colour-based analysis of the source characterisation of SWIRE populations using
galaxy template libraries will be discussed in more detail by Polletta et al. (2006b, in prep.),
while an analysis of the spectral energy distributions and photometric redshifts of SWIRE
sources is given by Rowan-Robinson et al. (2005).

In §2, we describe the data sets used in our analysis. §3 gives a detailed description 
of the classification technique we use to model the colour distribution of SWIRE sources. 
In §4, we analyse the properties of sources in each mode. We extend this analysis in §5, 
using optical/infrared template colours and star formation rate/stellar mass indicators to 
determine the type of sources in each distribution. In §6, we investigate how well our
empirical model can be used to describe simulated data from theoretical models. In §7, we
employ a method based on our classifications to identify unusual sources in the fields of
ELAIS-N1 and Lockman. Discussion and conclusions are presented in §8.



2. SWIRE 

The Spitzer Wide-area InfraRed Extragalactic survey (SWIRE), the largest Spitzer
Legacy program, is a wide-area imaging survey, mapping the distribution of spheroids, disks,
starbursts and active galactic nuclei (AGN) and their evolution from z~3 to the current
epoch. The survey covers ~49 square degrees (in 6 high galactic latitude fields) in all seven
Spitzer bands: 3.6, 4.5, 5.8, and 8μm with IRAC (Fazio et al. 2004) and 24, 70 and 160μm
with MIPS (Rieke et al. 2004), detecting ~2.5 million galaxies down to f_3.6μm ~ 3.7μJy.

The large area of SWIRE is important to establish statistically significant population 
samples over enough cells that we can resolve the star formation history as a function of 
epoch and environment, i.e. in the context of structure formation. The large volume is also 
essential for finding rare objects and transitory phenomena. 

In this paper, we investigate two of the largest SWIRE fields: ELAIS-N1 and Lockman,
covering a total of ~18 square degrees.



2.1. ELAIS-N1 and Lockman

The SWIRE ELAIS-N1 field is centred at 16h00m +59d01m, with coverage of ~6.5
sq.deg. IRAC (3.6μm, 4.5μm, 5.8μm and 8μm)+MIPS (24μm) observations were performed
in 2004 January and 2004 February.

The average 5σ depths of the ELAIS-N1 sample are 5.0, 9.0, 43, 40 and 311μJy at 3.6,
4.5, 5.8, 8, and 24μm respectively (Surace et al. 2004), consistent with the 90% completeness
levels for source extraction. For ELAIS-N1, optical U, g', r', i', Z data (complete to r'~23.5)
were taken between 1999 and 2003 using the Isaac Newton Telescope (INT) Wide-Field Survey
(WFS; McMahon et al. 2001, Gonzalez-Solares et al. 2005). The 5σ limiting optical depths
in Vega are U=23.4, g'=24.9, r'=24.0, i'=23.2 and Z=21.9.

The SWIRE Lockman field is centred at 10h45m +58d00m, with coverage of ~11.5
sq.deg. IRAC (3.6μm, 4.5μm, 5.8μm and 8μm)+MIPS (24μm) observations for Lockman
were performed in 2004 April and 2004 May. The average 5σ depths for the Lockman field
are 5.0, 9.0, 43, 40 and 311μJy at 3.6, 4.5, 5.8, 8, and 24μm respectively (Surace et al. 2004),
consistent with the 90% completeness levels for source extraction. For Lockman, optical U,
g', r', i' data (to r'~25) were taken between 2001 May and 2004 March using the MOSAIC
camera on the Mayall 4m telescope at Kitt Peak National Observatory (Siana et al. 2005,
in prep.). The 5σ limiting optical depths in Vega are U=25.0, g'=25.7, r'=25.0, i'=24.0.

For both fields, fluxes were extracted in 5.8" radius apertures for IRAC (~2-3 times the
FWHM beam) and 12" for MIPS, using SExtractor (Bertin and Arnouts, 1996). At redshift
z<0.2, the median source FWHM was found to correspond to 2.3 - 2.4" in the optical and
1.5 - 1.8" in IRAC. All SWIRE aperture fluxes were then aperture corrected for wings.
The absolute flux calibrations are correct within roughly 10% for IRAC and MIPS 24μm
channel data, and were confirmed by comparison to 2MASS. Further discussion on the data
processing is given by Surace et al. (2004, 2005) and Shupe et al. (2006). We use aperture 
fluxes for our entire sample since the light of a galaxy is measured consistently through the 
same aperture in all bands, thus allowing an unbiased comparison of the galaxy colours in 
our sample. 

The optical data sets for ELAIS-N1 and Lockman were processed with the Cambridge
Astronomical Survey Unit's reduction pipeline (Sabbey et al. 2001). Full analyses of the
photometric data in these fields are reported by Babbedge (2004), Surace et al. (2004) and
Siana et al. (2006, in prep.).

The ELAIS-N1 field contains 411,015 SWIRE sources. 254,693 of these sources have
optical associations in at least one optical band, are detected with a SNR of at least 5 in one
or more IRAC bands, and their 24μm associations have a SNR of at least 3. The Lockman
field contains 681,587 SWIRE sources, of which 255,908 sources have optical associations
with the same SNR criteria as those described for ELAIS-N1. A search radius of 1.5" was
used to bandmerge the optical/SWIRE data for both fields. Stars have been removed from
these samples using a star-galaxy separation criterion - see Surace et al. (2004).

We apply an additional constraint to these datasets, and only consider sources with 5σ
detections in all four of the IRAC bands (i.e. 3.6μm, 4.5μm, 5.8μm and 8μm). Applying
this criterion reduces our sample to 29,675 sources for ELAIS-N1 and 34,712 sources for
Lockman.



2.2. SWIRE photometric redshift catalogues 

For both the ELAIS-N1 and Lockman bandmerged datasets, sources with optical
associations have been analysed with a template-fitting photometric redshift code. This is ImpZ
(Babbedge et al. 2004), based on the code of Rowan-Robinson et al. (2003) and further
updated based on studies in Rowan-Robinson et al. (2005), which uses optical and near-infrared
data to 4.5μm to fit six optical galaxy templates and two AGN templates. The criteria for a
source being assigned a photometric redshift are at least 4 detections in optical (in particular
r'-band)+infrared (3.6μm, 4.5μm) bands, and a reduced χ² < 10.

Therefore, our final sample consists of 13,865 bandmerged optical/infrared sources in
ELAIS-N1 and 8749 bandmerged optical/infrared sources in Lockman, assigned photometric
redshifts and with detections in all four IRAC bands.



3. Modelling the density function in N-dimensional space

A galaxy can be described by a number of parameters such as flux, colour and redshift,
and these data can be represented as points (or vectors) in an N-dimensional parameter
space. Parametric modelling of the distribution function for these points can dramatically 
reduce the data volume and simplify comparison with physical or phenomenological models. 
Identifying structures in the distribution function can help us to understand the different 
galaxy populations. 



3.1. General Method 

We assume that the N-dimensional density function of galaxies in the sample is
composed of a mixture of multi-variate Gaussian functions. Each one of these Gaussian
"distributions" or "modes" may represent a "population" of galaxies with distinct properties.

We use the code from Connolly et al. (2000) to fit n data points in N-dimensional space
with m Gaussian distributions. This code uses an Expectation Maximization Algorithm
(Connolly et al. 2000) for parameter estimation, and a Bayesian Information Criterion (BIC -
see e.g. Nichol et al. 2000) for model selection, i.e. to decide how many Gaussian distributions
are statistically justified by the data. For each distribution i, the code will output mean
co-ordinates μ_i and an NxN covariance matrix Σ_i (see Appendix).
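
As a rough illustration of the fitting and model-selection step described above, the following sketch implements a small EM loop for a full-covariance Gaussian mixture and scores each fit with a BIC. It is not the Connolly et al. (2000) code: the function names (`fit_mixture`, `bic`, `select_n_modes`), the quantile-based starting means, the small regularisation term and the BIC sign convention (lower is better) are all assumptions made for this example.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log density of a multivariate normal at each row of X."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    white = np.linalg.solve(chol, (X - mean).T)   # whitened residuals, shape (d, n)
    maha = np.sum(white ** 2, axis=0)             # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def weighted_logpdf(X, weights, means, covs):
    """log(w_i N(x | mu_i, Sigma_i)) for every point (rows) and mode (columns)."""
    return np.stack([np.log(w) + log_gauss(X, mu, cov)
                     for w, mu, cov in zip(weights, means, covs)], axis=1)

def fit_mixture(X, m, n_iter=200, reg=1e-6):
    """Fit m Gaussian modes by EM; returns weights, means, covariances, log-likelihood."""
    n, d = X.shape
    # Assumed deterministic start: means at evenly spaced quantiles of the first variable.
    order = np.argsort(X[:, 0])
    means = X[order[((np.arange(m) + 0.5) * n / m).astype(int)]].copy()
    covs = np.array([np.atleast_2d(np.cov(X.T)) + reg * np.eye(d) for _ in range(m)])
    weights = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        # E-step: responsibility of each mode for each data point.
        logp = weighted_logpdf(X, weights, means, covs)
        resp = np.exp(logp - np.logaddexp.reduce(logp, axis=1)[:, None])
        # M-step: update normalisations, means and covariances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        for k in range(m):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    loglik = np.logaddexp.reduce(weighted_logpdf(X, weights, means, covs), axis=1).sum()
    return weights, means, covs, loglik

def bic(X, m):
    """BIC = -2 ln L + p ln n, with p the number of free mixture parameters."""
    n, d = X.shape
    *_, loglik = fit_mixture(X, m)
    n_par = m * (d + d * (d + 1) // 2) + (m - 1)
    return -2.0 * loglik + n_par * np.log(n)

def select_n_modes(X, max_modes=6):
    """Return the number of modes with the lowest BIC, plus all BIC scores."""
    scores = [bic(X, m) for m in range(1, max_modes + 1)]
    return int(np.argmin(scores)) + 1, scores
```

Under these assumptions, `select_n_modes(X)` on an array `X` of shape (n, N) returns the preferred number of modes; a production analysis would normally use several random restarts rather than a single deterministic start.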

We develop this technique to provide a classification scheme for galaxies. The sum of 
all m Gaussian modes is a Probability Density Function, i.e. it describes the probability 
that a galaxy selected at random will have a given set of data values x, and integrates to 1. 
Each Gaussian mode thus has a non-unit normalisation which encapsulates the probability
that a galaxy drawn at random from the population as a whole came from that particular
population. We will use the term Probability Density Function (PDF - see Appendix) to
describe each Gaussian mode. These PDFs allow us to determine the relative probability
that any galaxy is drawn from any particular distribution.

There are different ways to then classify the galaxies. We could relate each galaxy 
to the mode which gives that galaxy the highest probability density. Treating each
galaxy as an isolated case, this technique provides the optimal (maximum likelihood)
classification. However, since the distribution functions overlap, this will not provide the best
classification for the population statistics. Therefore, we choose an alternative in which we 
assign each galaxy randomly to one of the modes with probability proportional to the PDF 
value at the galaxy's position. In practice we found the choice of technique made very little 
difference. 
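
The two classification rules discussed above can be sketched as follows. This is a minimal illustration, not the code used in the analysis: the helper names and the two-mode toy model (weights, means and covariances) are hypothetical values chosen only to show the difference between picking the mode of highest density and drawing a mode at random with probability proportional to its PDF value.

```python
import numpy as np

def mode_densities(x, weights, means, covs):
    """Weighted density w_i * N(x | mu_i, Sigma_i) of every mode at a single point x."""
    x = np.asarray(x, dtype=float)
    d = x.size
    dens = []
    for w, mu, cov in zip(weights, means, covs):
        diff = x - mu
        maha = diff @ np.linalg.solve(cov, diff)
        norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
        dens.append(w * np.exp(-0.5 * maha) / norm)
    return np.array(dens)

def classify_max(dens):
    """Assign the galaxy to the mode with the highest probability density."""
    return int(np.argmax(dens))

def classify_random(dens, rng):
    """Assign the galaxy to a mode drawn with probability proportional to its density."""
    prob = dens / dens.sum()
    return int(rng.choice(len(dens), p=prob))

# Hypothetical two-mode model in two dimensions (illustrative values only).
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [2.0, 0.0]])
covs = np.array([np.eye(2), np.eye(2)])
```

For a galaxy lying exactly between two overlapping modes, `classify_max` always returns the same mode, whereas repeated calls to `classify_random` split such galaxies between the modes in proportion to the PDF values, which is what preserves the population statistics.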

In the following section, we discuss how this technique was applied to our data set. 



3.2. Specifics to the SWIRE ELAIS-N1 data

We consider 13,865 sources from the ELAIS-N1 bandmerged optical/infrared catalogue
for which photometric redshifts have been assigned using ImpZ (Babbedge et al., 2004).
These sources have detections in all four IRAC bands. We use these IRAC bands to
produce the following three infrared colour variables: log(f_8/f_5.8), log(f_5.8/f_4.5), log(f_4.5/f_3.6),
and include photometric redshift (z_ph) as a fourth variable. Since optical data have been
used to determine photometric redshifts for our bandmerged catalogues, the use of photo-z 
as a variable in our analysis can be considered as a non-linear mapping of optical colours, 
where the photo-z encodes some of the useful information from the optical which we cannot 
get from infrared colours alone. This also means we gain some information on rest-frame 
properties as well as the observed-frame projection. Future work will include the use of the 
optical colours individually (Davoodi et al. 2006b). In addition, the code we are using auto- 
matically standardises our variables to zero mean and unit variance before fitting Gaussian 
distributions, which removes any dependency on the choice of units (however, the outputs 
we quote have been renormalised to natural units). 
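
The construction of the four variables and the standardisation step can be sketched as below. The function names are hypothetical, base-10 logarithms are assumed for the colours, and the back-transformation is only one plausible way to renormalise fitted means and covariances to natural units.

```python
import numpy as np

def colour_vector(f36, f45, f58, f80, z_ph):
    """Stack the three IRAC log flux ratios and the photometric redshift, one row per source."""
    return np.column_stack([
        np.log10(f80 / f58),
        np.log10(f58 / f45),
        np.log10(f45 / f36),
        z_ph,
    ])

def standardise(X):
    """Rescale every variable to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std

def to_natural_units(mean_std, cov_std, mean, std):
    """Map a mean and covariance fitted in standardised space back to natural units."""
    return mean + std * mean_std, cov_std * np.outer(std, std)
```

Because the standardisation is linear, a Gaussian fitted in the rescaled space corresponds exactly to a Gaussian in the original units, so nothing is lost by quoting the outputs in natural units.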

High quality photometry measurements have been taken for over 2.5 million galaxies in 
the SWIRE Survey - see Surace et al. (2004). However, the effects of cosmic rays and/or 
artifacts in imaging maps can influence photometry measurements for a very small fraction 
of sources. This could therefore lead to some galaxies having extreme colours which would 
distort any Gaussian distributions fitted to our data. Therefore, we first identify such spu- 
rious sources algorithmically, by running the Expectation Maximization Algorithm to fit a 
single Gaussian distribution to the entire data set. We identify the 1% of sources with the lowest
PDF values (139 sources) in our sample. These sources all lie far from the main galaxy
locus in colour space, which we also identify by eye using the n-D visualisation software XGobi*.

Analysing the postage stamps of these sources reveals that all have errors in their pho- 
tometry. We therefore eliminate all 139 spurious sources from further analysis, reducing our 
sample down to 13,726 galaxies. 
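
A minimal sketch of this outlier cut follows, assuming that the single-Gaussian fit is equivalent to taking the sample mean and covariance, and using the fact that for one Gaussian the lowest PDF values correspond to the largest Mahalanobis distances. The function name `flag_low_density` is hypothetical.

```python
import numpy as np

def flag_low_density(X, fraction=0.01):
    """Boolean mask marking the `fraction` of rows with the lowest single-Gaussian density."""
    mean = X.mean(axis=0)
    cov = np.cov(X.T, bias=True)
    diff = X - mean
    # Squared Mahalanobis distance of every source from the mean.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    n_flag = int(round(fraction * len(X)))
    flagged = np.zeros(len(X), dtype=bool)
    flagged[np.argsort(maha)[len(X) - n_flag:]] = True
    return flagged
```

With `fraction=0.01` and 13,865 input sources this rule flags 139 sources, leaving 13,726, which matches the numbers quoted in the text.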



4. A parametric description of the SWIRE multi-colour data 

4.1. Three IRAC Colours plus Photometric Redshift 

Fitting Gaussian distributions to the 13,726 ELAIS-N1 sources in 3-colour-plus-redshift
space, we find that the data are best described by four four-dimensional Gaussians.

To test the robustness of our fits to the ELAIS-N1 data, we also fit to sources in the
Lockman field (8749 sources). This analysis gives us a measure of many of the systematic
errors in the parameters of our model. We find the distribution of the data in the Lockman
field is best described by the same four Gaussian distributions that fit the ELAIS-N1 data
set. The mean redshift and colours of our four distributions for ELAIS-N1 are given in Table
1. We have also assigned errors to these mean values, based on the variation between the two
SWIRE data sets. The covariance matrix for ELAIS-N1 and errors based on the covariance
matrix are given in Table 6 (see Appendix). Table 1 also gives an estimate of the expected
number of ELAIS-N1 sources N_exp in each of the four modes. N_obs is the actual number
of ELAIS-N1 sources according to our PDF classification technique. We find the numbers
determined using our technique are well within 1σ of the expected numbers.
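
The comparison of expected and observed numbers can be checked with a few lines of arithmetic. The counts below are copied from Table 1; treating the quoted 1σ values as Poisson errors, sqrt(N_exp), is an assumption of this sketch (the quoted errors are numerically consistent with it, but the text does not state how they were derived).

```python
import math

# Values copied from Table 1 (ELAIS-N1).
n_exp = {"Ca": 1420, "Cb": 2506, "Cc": 3465, "Cd": 6335}
n_obs = {"Ca": 1402, "Cb": 2496, "Cc": 3515, "Cd": 6313}

def poisson_sigma(n):
    """Assumed 1-sigma uncertainty on an expected count n."""
    return math.sqrt(n)

def deviation_in_sigma(expected, observed):
    """Absolute difference between observed and expected counts, in units of sigma."""
    return abs(observed - expected) / poisson_sigma(expected)

deviations = {k: deviation_in_sigma(n_exp[k], n_obs[k]) for k in n_exp}
```

Each entry of `deviations` is below one, in line with the statement that the observed numbers lie within 1σ of the expected ones.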

Figures 1a-c show IRAC colour-colour projections of sources assigned to each of the four
distributions (labelled Ca, Cb, Cc, Cd). Figure 1d shows the redshift distribution of Ca, Cb,
Cc and Cd. We also illustrate the optical colours of our four distributions (Figures 2a and
2b), although these variables were not used in the fits.

Of the four populations in 3-colour-plus-redshift space, population Ca (magenta) is
found to have the most extreme optical and infrared colours. This population has a mean
redshift of ⟨z_ph,a⟩ = 1.28, although its redshift distribution is found to be relatively broad,
spanning the redshift range of our entire sample. Galaxies in this population have very blue
optical colours, with (m_U-m_g) < 0.0, (m_g-m_r) < 1.0 and (m_r-m_i) < 1.0. In comparison,
these galaxies are found to have very red IRAC colours, with log(f_8.0/f_5.8), log(f_5.8/f_4.5) and



* http://www.research.att.com/areas/stat/xgobi/
Fig. 1. - Colour and redshift distributions for our sample of SWIRE galaxies, classified from
mixtures modelling in 3-IRAC colour plus redshift space. Four distributions have been
identified: Ca - magenta, Cb - orange, Cc - green, Cd - blue. (a) log(f_4.5/f_3.6) against log(f_8.0/f_5.8),
(b) log(f_5.8/f_4.5) against log(f_8.0/f_5.8), (c) log(f_4.5/f_3.6) against log(f_5.8/f_4.5) and (d) photometric
redshift histogram of the four distributions.
Fig. 2. - Optical colour-colour projections of SWIRE galaxies classified in 3-IRAC colour plus
redshift space: Ca - magenta, Cb - orange, Cc - green, Cd - blue. (a) (m_r-m_i) against (m_U-m_g)
and (b) (m_g-m_r) against (m_U-m_g).

Table 1. Mean co-ordinates of the four distributions in 3-colour-plus-redshift space for
ELAIS-N1. Errors are estimated from field-to-field variations. The covariance matrix of each
distribution is given in Table 6 - see Appendix.

Ci   N_exp ± 1σ   N_obs   ⟨log(f_8/f_5.8)⟩   ⟨log(f_5.8/f_4.5)⟩   ⟨log(f_4.5/f_3.6)⟩   ⟨z_ph⟩
Ca   1420 ± 38    1402     0.13 ± 0.01        0.12 ± 0.02          0.07 ± 0.02         1.28 ± 0.09
Cb   2506 ± 50    2496     0.66 ± 0.06       -0.10 ± 0.02         -0.10 ± 0.03         0.17 ± 0.03
Cc   3465 ± 59    3515    -0.05 ± 0.01       -0.02 ± 0.02         -0.16 ± 0.01         0.73 ± 0.11
Cd   6335 ± 80    6313     0.17 ± 0.05       -0.14 ± 0.01         -0.09 ± 0.01         0.32 ± 0.04

N_exp ± 1σ is an estimate of the expected number of galaxies that represent each of our four distributions
Ca, Cb, Cc and Cd. N_obs is the actual number of galaxies assigned to each of the four distributions.
log(f_4.5/f_3.6) > 0.0.

Population Cb (orange) contains sources at low redshift (Figure 1d), with mean redshift
⟨z_ph,b⟩ = 0.17 (Table 1). In the optical, this population is found to have red (m_U-m_g) and
(m_r-m_i) colours. In the infrared, the colours of Cb are found to be blue in the shorter IRAC
bands (log(f_4.5/f_3.6) < 0.0), and then become redder at longer wavelengths (log(f_8.0/f_5.8) >
0.4).

Population Cc (green) contains sources at intermediate redshift (⟨z_ph,c⟩ = 0.73). This
population has (m_g-m_r) and (m_r-m_i) > 0.8, indicating that these galaxies have red optical
colours. In comparison, this same population is found to be relatively blue in log(f_4.5/f_3.6)
and log(f_8.0/f_5.8) colour.

Population Cd (blue) also contains sources at low redshift, with ⟨z_ph,d⟩ = 0.32. In the
optical, this population has (m_U-m_g) > 0.0, similar to population Cb, although with redder
(m_g-m_r) colour. In addition, an obvious bi-modality can be seen in (m_U-m_g) colour (Figures 2a
and 2b), where this population is separated into two smaller populations at (m_U-m_g)~0.7.
At shorter IRAC wavelengths, Cd has blue IRAC colours with log(f_4.5/f_3.6) and log(f_5.8/f_4.5)
< 0.0. As we move to the longer IRAC wavelengths, we find that Cd has a broad range of
log(f_8.0/f_5.8) colour.



4.2. Classification of sources not assigned photometric redshifts 

3014 sources from the ELAIS-N1 data set (with detections in all four IRAC bands)
could not be assigned photometric redshifts using the template fitting photometric redshift
code, ImpZ. This is either because a source has insufficient detections in the optical bands
used for template fitting, or because no galaxy template provides a good fit to the SED.

We now classify sources with no photometric redshifts using these same four distri- 
butions, by marginalising our four-dimensional Gaussian distributions over redshift. The 
advantage of this technique over re-classifying using modes identified in pure 3-colour space 
is that we use the same classes for sources with and without photometric redshift and use 
these classes to give us some idea about the redshift distribution of the sources without 
photometric redshifts. 
Fig. 3. - log(f_4.5/f_3.6) against log(f_8.0/f_5.8) colour-colour diagram of 16,740 galaxies. 13,726 of
these sources have been assigned photometric redshifts, 3014 sources have not been assigned
redshifts. Marginalising our four Gaussian distributions in 3-colour-plus-redshift space allows sources
not assigned photometric redshifts to be classified from 3-colour data: Ca - magenta, Cb - orange,
Cc - green, Cd - blue.

Table 2. Number of sources assigned to each of the four marginalised distributions in 3-colour
space

Ci   N_1 (%)       N_2 (%)       N_3 (%)
Ca   1402 (10%)    1703 (12%)    2087 (69%)
Cb   2496 (18%)    2844 (20%)     100 ( 4%)
Cc   3515 (26%)    3671 (27%)     337 (11%)
Cd   6313 (46%)    5508 (41%)     490 (16%)

N_i is the number of galaxies in each of the four distributions Ca, Cb, Cc and Cd:
(i=1) in 3-colour-plus-redshift space; (i=2) by marginalising over redshift, but for galaxies with redshift
information; (i=3) by marginalising over redshift, but for galaxies without redshift information.
We integrate the PDF of each distribution (P_4D(x)) over a redshift interval, defined by
the mean redshift (⟨z_ph⟩) and standard deviation (σ_z) of each distribution:

\int_{-\infty}^{+\infty} P_{4D}(\mathbf{x})\, dx_4 \simeq \int_{\langle z_{ph}\rangle - 5\sigma_z}^{\langle z_{ph}\rangle + 5\sigma_z} P_{4D}(\mathbf{x})\, dx_4

We first re-classify the 13,726 sources with photometric redshifts (discussed in §4.1) 
using the marginalised distributions, and compare with the original classification. We then 
classify the 3014 sources not assigned photometric redshifts using the same marginalised 
distributions. 
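
The marginalisation over redshift can be illustrated numerically. For a Gaussian, integrating over one variable is equivalent to dropping that variable from the mean vector and the corresponding row and column from the covariance matrix, so the ±5σ_z integral can be compared with the analytic result. In the sketch below the mean is taken from Table 1 (Ca), but the covariance matrix is hypothetical (the real one is in Table 6), and the function names are assumptions of this example.

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Multivariate normal density at a single point x."""
    x = np.asarray(x, dtype=float)
    diff = x - mean
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** len(mean) * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def marginalise_last(mean, cov):
    """Analytic marginal of a Gaussian over its last variable (here, redshift)."""
    return mean[:-1], cov[:-1, :-1]

def integrate_over_z(colours, mean, cov, n_sigma=5.0, n_grid=2001):
    """Trapezoidal integral of the 4-D density over <z> +/- n_sigma * sigma_z."""
    sigma_z = np.sqrt(cov[-1, -1])
    z = np.linspace(mean[-1] - n_sigma * sigma_z, mean[-1] + n_sigma * sigma_z, n_grid)
    vals = np.array([gauss_pdf(np.append(colours, zi), mean, cov) for zi in z])
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(z)))

# Mean from Table 1 (Ca); the covariance below is hypothetical, for illustration only.
mean = np.array([0.13, 0.12, 0.07, 1.28])
cov = np.array([
    [0.020, 0.004, 0.002, 0.010],
    [0.004, 0.010, 0.002, 0.003],
    [0.002, 0.002, 0.010, 0.004],
    [0.010, 0.003, 0.004, 0.250],
])
colours = np.array([0.15, 0.10, 0.08])

numeric = integrate_over_z(colours, mean, cov)
mean3, cov3 = marginalise_last(mean, cov)
analytic = gauss_pdf(colours, mean3, cov3)
```

For sources near the bulk of a mode, `numeric` and `analytic` agree closely, which is why the finite ±5σ_z interval is an adequate stand-in for the infinite integral.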

Figure 3 is a colour-colour projection of the classification of 16,740 (13,726+3014) 
sources. Table 2 shows the number of objects assigned to each of the four marginalised 
distributions, and also a comparison of the two classification schemes. We find reasonable 
agreement between the two sets of classifications with the numbers in each class changing 
by less than 5%. 

We then classify sources without redshifts using the marginalised distributions (N_3; see
Table 2) and find that ~70% of sources are assigned to population Ca. This population has
a broad redshift range, and contains sources with very red IRAC colours. These sources have
not been assigned photometric redshifts because their optical counterparts are fainter than our
detection limits. This means the optical+3.6μm+4.5μm galaxy templates of Rowan-Robinson
et al. (2005) could not be used to model the SED of such sources and therefore determine
reliable photometric redshifts. These sources will be discussed further in §7. We find that
20% of sources not assigned photometric redshifts have been classified to the two low redshift
populations Cb and Cd. We therefore expect these sources to have redshifts of no more than
0.6. The remaining ~11% of our sample have been assigned to population Cc, where sources
were found to have very blue IRAC colours, and z_ph = 0.3 - 1.2.



5. Galaxy templates and SFR/Stellar Mass Indicators 

In order to establish the types of galaxy populating each distribution, we compare with 
galaxy templates. First, we compare the colours of galaxies classified with our scheme, with 
the SWIRE galaxy template library, which has full optical and infrared coverage. Then we 
use optical and infrared bands outside the wavelength range of the IRAC data used in the
classification (U, r' and 24μm) together with 3.6μm data to better understand the properties
of our four populations.

The SWIRE optical/infrared template library was compiled especially for the SWIRE
samples and includes the following spectral types: Ellipticals (three Ellipticals of age 2, 5
and 13 Gyr), Spirals (Sa, Sb, Sc, Sd), Starbursts (M82 and Arp220), QSO, Seyferts, ULIRGs
and obscured AGN.

Figure 4 illustrates these galaxy templates and our classifications in IRAC colour space.
Interpreting optical U-band and infrared 24μm as indicators of star formation rate (SFR),
and optical r'-band and infrared 3.6μm as indicators of stellar mass, we also illustrate the
four populations in (m_U-m_3.6) against (m_r'-m_24) colour space (Figure 5). We find that
this combination of the four bands gives the best separation of the four populations in
colour-colour space, since each colour is made up of an optical and an infrared band with a
large wavelength separation between them. Therefore, each colour is essentially comparing
the optical and infrared SEDs of each population, which will provide more information in
colour-colour space than using an optical (m_U-m_r') and an infrared (m_3.6-m_24) colour.

Galaxies in population Ca have already been found to occupy a distinct region of IRAC
colour space by virtue of their strong, red continua (Lacy et al. 2004). Their blue (m_U-m_3.6)
colour and red (m_r'-m_24) colour suggest they are dusty systems with high star formation
rates such as ULIRGs (Farrah et al. 2001), i.e. L_IR > 10^12 L_⊙. The template tracks (Figure 4)
indicate Ca is dominated by AGN and dusty star-forming galaxies.

Modelling of mid-IR SEDs based on ISO spectra (Sajina et al. 2005) suggests galaxies
in population Cb are dominated by PAH emission in the IRAC bands, particularly at 8μm.
This would explain why population Cb has very red log(f_8.0/f_5.8) colours and is separated
from the other three distributions in IRAC colour space. The (m_U-m_3.6) and (m_r'-m_24)
colours suggest that galaxies in population Cb are star-forming systems. Galaxy templates
indicate that this population is dominated by low redshift late-type spirals.

The (m_U-m_3.6) colour of population Cc is found to be redder than the AGN/dusty
star-forming systems in population Ca. The (m_r'-m_24) colour of Cc is found to be similar to
that of population Ca, an indication that galaxies in Cc are also dusty systems. Galaxy
templates suggest that this population contains spiral galaxies and dusty starburst systems
at intermediate redshifts, where L_IR > L_opt.

Cd consists of two sub-populations. Sources with (m_r'-m_24) > 2.5 are found between
populations Cb and Cc. These sources have bluer log(f_8.0/f_5.8) colour and redder (m_r'-m_24)
colour than the late-type spirals in Cb, and redder log(f_8.0/f_5.8) colour and bluer (m_r'-m_24)
colour than dusty systems in Cc. They lie in similar regions of IRAC colour space as Cb,
but their PAH emission is less dominant than Cb at 8μm. Therefore, galaxies in Cd with
(m_r'-m_24) > 2.5 are low redshift early-type spirals. Sources in Cd with (m_r'-m_24) < 2.5
are low redshift ellipticals. These systems have large stellar masses due to their old stellar
Fig. 4. - log(f_4.5/f_3.6) against log(f_8.0/f_5.8) colour-colour diagram of the four galaxy populations
(Ca - magenta, Cb - orange, Cc - green, Cd - blue) with the SWIRE galaxy template colours
overplotted. Solid curves correspond to z<0.6, dashed curves to z=0.6-1.2 and dotted curves
to z>1.2.
Fig. 5. - (m_U-m_3.6) against (m_r'-m_24) SFR/Stellar mass colour-colour indicators for each of the
four populations: Ca - magenta, Cb - orange, Cc - green, Cd - blue.
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populations, and their lack of dust content leads to low levels of infrared emission. The two 
galaxy classes found within this single mode are separated into two modes in optical studies 
(e.g. Baldry et al 2004). Since our distributions were identified using IRAC colours which 
at low redshift detect old stars where ellipticals and early-type spirals are similar, the low-z 
elliptical and spiral populations are found in within this single mode. 



6. Comparing the SWIRE distribution functions to theoretical models 

Our parametric description of the SWIRE colour distribution functions contains a total 
of 60 parameters - (4 modes, each containing 15 parameters) - see Appendix, Table 6. 
However, this massively reduced data set still has enormous discriminatory power. 
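As a quick check on this count, the sketch below tallies the free parameters of a Gaussian mixture with full covariance matrices. It assumes that the 15 parameters per mode are the 4 means, the 10 independent elements of a symmetric 4x4 covariance matrix and 1 amplitude; that breakdown is our reading rather than something stated here, and the function name is purely illustrative.

```python
def mixture_parameter_count(n_modes: int, n_dims: int) -> int:
    """Free parameters of a Gaussian mixture with full covariance matrices."""
    n_means = n_dims                            # one mean per dimension
    n_covariance = n_dims * (n_dims + 1) // 2   # unique elements of a symmetric matrix
    n_amplitude = 1                             # assumed: one amplitude per mode
    return n_modes * (n_means + n_covariance + n_amplitude)


# 4 modes in 3-colour-plus-redshift (4-dimensional) space
print(mixture_parameter_count(n_modes=4, n_dims=4))  # prints 60
```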

To demonstrate this we fit the distribution functions in "marginalized" 3-colour-plus-redshift
space (our empirical model) to predictions from one semi-analytic and one phenomenological
model. These a priori theoretical models were expected to provide a reasonable description
of the SWIRE observational data. We use the distribution functions in marginalized
3-colour-plus-redshift space rather than those determined in actual 3-colour-plus-redshift
space since the criteria for determining the redshift distributions of the three
models presented here will be different, and this may introduce a bias in their comparisons.

Table 3. Comparison of the classification of SWIRE, Xu and GalICS sources using our four
observed distribution functions. Nmodel is the number of sources from each model assigned to
each distribution function.

Ci    NSWIRE        NXu           NGalICS
Ca    3790 (23%)    4944 (42%)     106 (1%)
Cb    2944 (17%)    1218 (11%)    3167 (22%)
Cc    4008 (24%)     378 (3%)     1424 (10%)
Cd    5998 (36%)    5186 (44%)    9720 (67%)

Fig. 6. log(f4.5/f3.6) against log(f8.0/f5.8) colour-colour distributions of (a) SWIRE, (b) Xu and
(c) GalICS sources; Ca - magenta, Cb - orange, Cc - green, Cd - blue.

First we must determine whether our distribution functions are a good description of
the real data; our previous analysis has merely argued that these are the best Gaussian
mixture description of the data. We thus bin our multi-colour data into a multi-dimensional
histogram and apply a χ² test. We choose a bin width of 0.05 in log colour space and
select only those cells that our model predicts will contain two or more galaxies. Taking
into account the 60 free parameters of the model, we determine a reduced χ² ~ 2.7, a
surprisingly good fit given that there is no reason for the underlying distribution of each
population to be exactly Gaussian.
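The binned test described above can be sketched as follows. This is a minimal illustration, not our pipeline: it assumes a Pearson χ² with the variance of each cell taken equal to the model prediction, and a number of degrees of freedom equal to the number of cells used minus the number of free parameters. The function name and the toy counts are invented for the example.

```python
def reduced_chi_square(observed, expected, n_free_params, min_expected=2.0):
    """Reduced chi-square over histogram cells with enough predicted sources.

    observed, expected: flattened per-cell counts (data and model prediction).
    Cells where the model predicts fewer than `min_expected` sources are skipped.
    """
    used = [(o, e) for o, e in zip(observed, expected) if e >= min_expected]
    chi_square = sum((o - e) ** 2 / e for o, e in used)   # assumed Poisson-like variance
    dof = len(used) - n_free_params
    if dof <= 0:
        raise ValueError("not enough cells for the number of free parameters")
    return chi_square / dof


# Toy example: five cells, one of which falls below the two-source threshold.
obs = [12, 7, 3, 1, 20]
model = [10.0, 8.0, 2.5, 0.5, 18.0]
print(reduced_chi_square(obs, model, n_free_params=1))
```

For the real data the cell counts would come from the multi-dimensional histogram with a bin width of 0.05 in log colour, and `n_free_params` would be 60.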

We compare our description of the data with two simulated data sets, a 5 square degree
SWIRE mock catalogue from Xu et al. (2001, 2003) and a mock catalogue from GalICS
(Hatton et al. 2003), made up of five 1 square degree observing cones.

The SWIRE mock catalogue of Xu is based on a set of "backward" galaxy evolution
models for nearby infrared bright galaxies, together with a library of SEDs of 837 IRAS 25 μm
selected galaxies. By attaching an appropriate SED from this library to each source predicted
by a given evolution model, a Monte Carlo algorithm enables simultaneous comparisons of
these sources in a wide range of wavebands. The mock catalogue from GalICS is based
on a hybrid model for hierarchical galaxy formation studies, using the outputs of large
cosmological N-body simulations combined with a semi-analytic model. Galaxies in both
catalogues have detections at 3.6 μm, 4.5 μm, 5.8 μm and 8 μm above the SWIRE 5σ flux
limits (§2.1).

Since the SWIRE galaxies used to determine our empirical model have errors associated
with their colours, we add random Gaussian errors to the colours of sources from the Xu and
GalICS mock catalogues. We do this to avoid any bias in the comparison of our empirical
model with that of the theoretical models. The synthetic data consist of 11,726 sources
from the mock catalogue of Xu and 14,417 sources from GalICS. Figures 6a, b and c show
IRAC colour-colour projections of SWIRE, Xu and GalICS sources. It is immediately clear
from these plots that the mock catalogues do not describe the data well.
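Adding random Gaussian errors to the mock colours can be sketched as below. The per-colour standard deviations are placeholders, since the error amplitudes actually applied are not quoted in this section, and the function name is hypothetical.

```python
import random


def perturb_colours(colours, sigmas, seed=None):
    """Return a copy of `colours` with zero-mean Gaussian noise added.

    colours: list of rows, one row of log colours per mock source.
    sigmas:  one standard deviation per colour (assumed values, not from the paper).
    """
    rng = random.Random(seed)
    return [[c + rng.gauss(0.0, s) for c, s in zip(row, sigmas)] for row in colours]


# Two mock sources with three IRAC log colours each; sigmas are illustrative only.
mock = [[-0.20, 0.05, 0.40], [0.10, 0.15, 0.25]]
noisy = perturb_colours(mock, sigmas=[0.03, 0.05, 0.05], seed=42)
print(noisy)
```

Fixing the seed makes the perturbed catalogue reproducible, which is convenient when the same mock sample is classified and binned more than once.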

To obtain more insight into this we classify the sources in the mock catalogues by
assigning each synthetic source to one of the Gaussians, as we did with the real data in
Section 4.1. The results are tabulated in Table 3. We also demonstrate how simple it will
be to compare predictions from future models with our empirical description of the SWIRE
data, repeating the χ² analysis to quantify how well our Gaussian mixture "model" fits the
synthetic "data".
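A classification of this kind can be sketched as follows. We assume here that each source is assigned to the mode with the highest (optionally amplitude-weighted) probability density; the exact rule of Section 4.1 is not restated in this section, so this should be read as one plausible choice. The density values are invented for the example.

```python
from collections import Counter

MODES = ("Ca", "Cb", "Cc", "Cd")


def assign_mode(densities, weights=None):
    """Label a source with the mode of highest (weighted) density.

    densities: probability density of the source under each of the four modes.
    weights:   optional mode amplitudes; equal weights are assumed if omitted.
    """
    if weights is None:
        weights = [1.0] * len(densities)
    scores = [w * d for w, d in zip(weights, densities)]
    return MODES[scores.index(max(scores))]


def tabulate(labels):
    """Counts and percentages per mode, in the layout of Table 3."""
    counts = Counter(labels)
    total = len(labels)
    return {mode: (counts[mode], 100.0 * counts[mode] / total) for mode in MODES}


# Invented densities for four synthetic sources.
example = [
    [0.90, 0.05, 0.20, 0.10],
    [0.01, 0.02, 0.03, 0.80],
    [0.10, 0.70, 0.30, 0.20],
    [0.02, 0.01, 0.05, 0.60],
]
labels = [assign_mode(row) for row in example]
print(labels)
print(tabulate(labels))
```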

The Xu model was intended to match the monochromatic number counts, and provides 
a good description of the IRAC number counts. However, based on a finite set of ISO 
templates for the mid-infrared part of SED's, this model was not designed to match the 
exact colour distributions of the IRAC bands. Nevertheless it is interesting to explore how 
it fails in describing the colour distribution. 

Comparing SWIRE (Figure 6a) with Xu (Figure 6b), we find the fit corresponds to a
reduced χ² ~ 10^?, which is mainly due to the absence of Xu sources in the log(f8.0/f5.8) =
0.2 - 0.6 colour region of population Cd. The majority of Xu sources in this population are
either concentrated at log(f8.0/f5.8) > 0.6, where very low redshift spiral galaxies dominate,
or at log(f8.0/f5.8) < 0.2, where we find elliptical galaxies. The Xu model is found to severely
underpredict redshifted spiral galaxies (with z = 0.2 - 0.5) in the colour region log(f8.0/f5.8)
= 0.2 - 0.6. We also find the model of Xu underpredicts spiral and starburst galaxies with
log(f8.0/f5.8) > -0.1 in population Cc.

The GalICS model is a more physically motivated model and does not rely on a fixed
set of templates. However, it preceded Spitzer and has not been tuned to those number
counts. We find SWIRE sources redder than log(f4.5/f3.6) = 0 have been assigned to class
Ca (magenta), making up 23% of the entire sample. These sources consist of AGN and
dusty systems such as ULIRGs over a broad redshift range (see Section 5). However, there
are very few GalICS sources in the same region of colour space (Figure 6c), and population
Ca only makes up 1% of the GalICS sample. The lack of GalICS AGN could therefore
account for the high value of reduced χ² (~10^?). The GalICS model does not include any
accretion physics, which would explain the discrepancy. As a test, we simulate population Ca
to determine whether the absence of these sources from the GalICS model accounts for the
bad fit. We find the GalICS model still provides a poor description of our observational data,
with a reduced χ² ~ 51. However, simulating this population does show that the absence of
population Ca is the main reason for the initial high value of reduced χ². As with the model
of Xu, the GalICS model also underpredicts spiral and starburst galaxies in population Cc,
with log(f8.0/f5.8) > -0.1. However, the GalICS model is found to over-predict galaxies in
population Cb, with log(f4.5/f3.6) > ? and log(f8.0/f5.8) > 0.4. Sources assigned to this mode
and with these IRAC colours are not found to exist in the observational data set.

This clearly illustrates a catastrophic failure of the a priori models to match our description
of the data. This also demonstrates the need for a complete model of all extragalactic
phenomena when trying to compare models with observational data (e.g. the addition of an
AGN component to GalICS).



7. Outliers 

Intrinsically rare galaxies and those passing through transient phases are important 
for investigating the extreme limits of galaxy formation and understanding the complete 
evolutionary behaviour of galaxies. The search for these sources was a major motivation of 
the SWIRE survey and a driver for the wide area and hence large volume. These sources 
are likely to have unusual colours and so appear as "outliers" in multi-colour space. 
Fig. 7. (a) Likelihood curve illustrating the number of SWIRE sources (solid line) as a function of
PDF(TOTAL), and the expected number of sources (dashed line) based on simulated data modelled
by Gaussians. (b) Reliability R as a function of PDF(TOTAL). Sources with values of PDF(TOTAL)
below the cut (red) are not modelled well by our Gaussian distributions. These sources exhibit
unusual colours compared to the majority of sources in our sample, and so make up our sub-sample
of candidate outliers.



Having modelled the distribution function of galaxies in ELAIS-N1 (§4), we use this
model to obtain a selection of candidate outliers in the two SWIRE fields of ELAIS-N1 and
Lockman. Our technique for identifying outliers is based on determining the probability
density in each of the four distributions (PDFi), which we sum to give a PDF(TOTAL) for
every object.
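The summed density can be sketched as below, using only the standard library. The multivariate Gaussian density is evaluated through a Cholesky factorisation of the covariance matrix. Whether the individual PDFi are weighted by the mode amplitudes before summing is not specified above, so the amplitude is left as an explicit input; the two toy modes are invented and are not the fitted parameters of Table 6.

```python
import math


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def gaussian_density(x, mean, cov):
    """Density of a multivariate Gaussian with full covariance at point x."""
    n = len(x)
    lower = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y = []                                   # solve lower * y = diff by forward substitution
    for i in range(n):
        y.append((diff[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i])
    mahalanobis_sq = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(n))
    return math.exp(-0.5 * (mahalanobis_sq + log_det + n * math.log(2.0 * math.pi)))


def pdf_total(x, modes):
    """Sum of the per-mode densities; each mode is (amplitude, mean, covariance)."""
    return sum(amp * gaussian_density(x, mean, cov) for amp, mean, cov in modes)


# Two invented 2-D modes, for illustration only.
toy_modes = [
    (1.0, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),
    (1.0, [2.0, 1.0], [[2.0, 0.5], [0.5, 1.0]]),
]
print(pdf_total([0.5, 0.5], toy_modes))
```

The same functions work unchanged in the 4-dimensional colour-redshift space, since the dimension is taken from the length of the input vector.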

To identify a sub-sample of outliers, we first need to define an optimal cut in PDF(TOTAL).
We do this by comparing the PDF(TOTAL) of our observed data with that of a simulated
data set modelled by our four Gaussians. The simulated data set gives the expected number
of sources Nexp below a given PDF(TOTAL) (Figure 7a - dashed line), and our SWIRE data
set gives the observed number of sources Nobs (Figure 7a - solid line). Our optimal cut
in PDF(TOTAL) will occur in the tail of the resulting likelihood distributions.

We define a reliability,

\[ R = \frac{N_{\rm obs} - N_{\rm exp}}{N_{\rm obs}}, \]

being the ratio of unexpected objects to the total number.

Figure 7b illustrates R as a function of PDF(TOTAL). R has a maximum of 96% at
PDF(TOTAL) = 0.022. We therefore apply a cut PDF(TOTAL) < 0.022 to our sample and
identify 242 candidate outliers in ELAIS-N1 and 225 in Lockman.
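The search for the optimal cut can be sketched as follows. The sketch assumes that the simulated sample is rescaled to the size of the observed sample before the counts are compared, which is our assumption rather than a stated step, and the PDF(TOTAL) values are invented.

```python
def reliability_curve(observed_pdf, simulated_pdf, cuts):
    """Reliability R = (Nobs - Nexp) / Nobs for each trial cut in PDF(TOTAL).

    Nobs counts observed sources below the cut; Nexp counts simulated sources
    below the cut, rescaled to the size of the observed sample (an assumption).
    """
    scale = len(observed_pdf) / len(simulated_pdf)
    curve = []
    for cut in cuts:
        n_obs = sum(p < cut for p in observed_pdf)
        n_exp = scale * sum(p < cut for p in simulated_pdf)
        if n_obs > 0:                       # R is undefined when nothing lies below the cut
            curve.append((cut, (n_obs - n_exp) / n_obs))
    return curve


def best_cut(curve):
    """Cut with the highest reliability (the first one if several tie)."""
    return max(curve, key=lambda pair: pair[1])


# Invented PDF(TOTAL) values: the observed sample has an excess in the low tail.
observed = [0.001, 0.002, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4]
simulated = [0.03, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
curve = reliability_curve(observed, simulated, cuts=[0.02, 0.04, 0.2])
print(curve)
print(best_cut(curve))
```

In practice the trial cuts would be a fine grid over the tail of the distribution, and the selected sub-sample would be the sources with PDF(TOTAL) below the best cut.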

Since some of these candidate outliers might have spurious detections in some bands,
we use the IPAC-Skyview† software to remeasure their photometry in each of the four IRAC
bands, and compare with their catalogue photometry. This allows sources which have spurious
detections (and therefore spurious colours) to be eliminated from our candidate list.
We eliminate 172 sources from ELAIS-N1 and 79 sources from Lockman. These sources
either have bad detections in at least one of the IRAC bands, or are near very bright sources
causing a bias in their detections. We have thus identified 70 genuine outliers in ELAIS-N1
and 146 outliers in Lockman.



7.1. Candidate outliers 

Here we discuss a small sub-sample of 34 candidates found in Lockman, as examples of
the types of outliers found in different regions of our colour-redshift space. Properties of the
remaining 182 outliers from ELAIS-N1 and Lockman can be found online‡.

Figure 8 shows their location in IRAC colour-colour space. Figure 9 (1-34) shows the
SEDs of each of these selected candidates. Tables 4 and 5 show the various properties of each
of these outliers, having modelled the full SED of each outlier with SWIRE optical/infrared
galaxy templates. These outliers are, by construction, the most unusual objects found in our
sample. Our fits may therefore be poor in some cases, as our libraries may not contain the
right templates. However, some galaxies may have reasonable fits because a standard template
produces unusual colours at very specific redshifts. Future SWIRE papers will present more
detailed modelling of unusual objects (Lonsdale et al. 2006, in prep.).



† http://www.ipac.caltech.edu/Skyview/

‡ http://www.astronomy.sussex.ac.uk/~payam/outliers/index.html

Fig. 8. log(f4.5/f3.6) against log(f8.0/f5.8) colour-colour projection of 216 outliers (cyan).
Numbered sources (34 outliers) are representative of the different spectral types in each region of colour
space; red - Spirals, green - Ellipticals, orange - Seyferts, maroon - Starburst/QSO, magenta -
Spiral/Starburst, brown - QSO, black - Mrk231, purple - ULIRG, red - Obscured AGN.
Outliers 1, 2, 3, 4 - Outliers identified in this region of colour space have log(f8.0/f5.8) > 0.5
and log(f4.5/f3.6) < -0.1, and are most likely to be star-forming galaxies. Their SEDs are all well
represented by a spiral template at zph = 0.05, with low infrared to optical flux ratios. Sources
identified as outliers in this region of colour space have very sharp PAH features at 8 μm,
which give them peculiar colours at this specific redshift.

Outliers 5, 6, 7, 8 - These sources represent outliers with log(f4.5/f3.6) < -0.1 and
log(f8.0/f5.8) < 0.2. The template colours in §5 suggest ellipticals dominate this region of
colour space. Fitting templates to the SEDs of outliers in this region also suggests that
these are ellipticals. Outliers 5 and 8, which are very blue in log(f4.5/f3.6), are found to have
low infrared to optical flux ratios and have zph ~ 0.05. In comparison, outliers 6 and 7 are less
blue in log(f4.5/f3.6), yet much bluer in log(f8.0/f5.8), and have higher redshifts of zph ~ 0.2 -
0.5. These sources are not intrinsically unusual objects but are outliers due to their redshift.

Outliers 9, 10 - Outliers 9 and 10 are found to have log(f4.5/f3.6) < -0.3, and are
identified as outliers due to PAH features giving odd colours at specific redshifts. With high
infrared to optical flux ratios, such sources are likely to be dusty systems. The SEDs of
these outliers are found to resemble that of a Seyfert 2 galaxy, with intermediate redshifts
of zph ~ 0.8.

Outlier 11 - This source is found to have log(f5.8/f4.5) > 0.5, log(f8.0/f5.8) > 0.1, and low
infrared to optical flux ratios. With a strong detection at 24 μm, the best fitting template to
the SED of this object is an M82 Starburst at zph ~ 0.07.

Outlier 12 - This outlier is not found to have extreme infrared colours, but is of
particularly high redshift for sources found in this region of colour space. With low infrared
to optical flux ratios, the SED of this outlier resembles that of a Type 1 QSO at a redshift
of zph ~ 2.2.

Outliers 13, 14 - These sources are found to have log(f8.0/f5.8) < -0.3, and represent
a cluster of outliers in this region of colour space. Outlier 13 has high optical and IRAC
emission, but is not detected at 24 μm. The SED of this outlier resembles that of a highly
luminous spiral galaxy at zph ~ 2.4. Outlier 14 is detected at 24 μm, and has high infrared to
optical flux ratios. Therefore, its SED is modelled as an NGC6090 Starburst at zph ~ 1.4.

Outliers 15, 16, 22, 24 - These sources are found to have log(f4.5/f3.6) > 0.35. The
infrared SEDs of all four outliers are very similar, yet outliers 15 and 16 are blank in the
optical. Outliers 22 and 24 have very faint g', r' and i'-band detections, with magnitudes
in the range 23 - 24.4. All these outliers are expected to have high levels of dust obscuration.
Modelled as Type 1 QSOs with high infrared to optical flux ratios, they have a redshift
range of zph ~ 1.8 - 2.4.

Outliers 17, 21 - Located in similar regions of colour space to outliers 15, 16, 22 and
24, these two sources are modelled as Mrk231 (dust-dominated AGN). Both outliers have
high 24 μm detections (hence, high levels of star-formation) characteristic of Mrk231, and 17
is blank in the optical. At zph ~ 2, these sources would have LIR ~ 10^? - 10^? L⊙, putting them
in the same category as Hyperluminous Infrared Galaxies (HLIRGs - Farrah et al. 2004).
These types of sources are among a rare population of galaxies recently discovered by Spitzer
(Houck et al. 2005, Yan et al. 2005, Lonsdale et al. 2006, in prep).

Outliers 18, 19, 20, 23, 25, 26, 27 - Sources with IRAC colours log(f4.5/f3.6) > 0.1
and log(f8.0/f5.8) > 0.2. These types of sources are outliers because they have PAH features
which throw out the infrared colours at particular redshifts. With strong detections in the
mid-infrared, particularly at 24 μm, we expect sources in these regions of colour space to
be very dusty star-forming systems, such as ULIRGs. Outliers 25, 26 and 27 are all blank
in the optical, an indication of dust obscuration since these same systems have relatively
low redshifts in the range zph ~ 0.17 - 0.22. Outliers 18, 19, 20 and 23 are the only ULIRG
modelled sources with optical detections. Outliers 18 and 19 have redder IRAC colours than
outliers 20 and 23 and are modelled at zph ~ 0.18. In comparison, 20 and 23 have much higher
redshifts of zph ~ 1.2 - 1.8.

Outliers 28, 29, 30, 31, 32, 33 and 34 - A selection of outliers we have identified
with very red infrared SEDs and no optical detections. These sources have IRAC colours in
the range log(f8.0/f5.8) = 0.2 - 0.6 and log(f4.5/f3.6) = 0.1 - 0.5, making them amongst the
reddest objects we have in our sample. They are found to have faint near-infrared detections,
in comparison to their very high mid-infrared 24 μm emission, ranging from 0.7 - 3 mJy. The
spectral energy distributions of these sources are best modelled by dust-enshrouded strongly
obscured AGN, where the high mid-infrared emission could either be due to dust heated by
the AGN, or substantial star-formation. At redshifts in the range zph ~ 2 - 4, their infrared
bolometric luminosities are found to be LIR ~ 10^? - 10^? L⊙. If star-formation is responsible
for the high infrared emission in these systems, this range of infrared bolometric luminosity
would correspond to infrared star-formation rates of 1.5x10^? - 5.2x10^? M⊙ yr^-1. Figure 10
shows the optical and infrared postage stamps of outlier 28, and illustrates how these sources
are heavily obscured in the optical, but have very high emission in the mid-infrared bands.
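The IRAC colours quoted for these sources follow directly from the catalogue fluxes. As a worked example, the snippet below computes both colours for outlier 28 from its Table 4 fluxes (8.1, 22.5, 63.2 and 119.0 μJy at 3.6, 4.5, 5.8 and 8 μm); the helper name is illustrative.

```python
import math


def log_colour(flux_long, flux_short):
    """Log flux ratio between a longer- and a shorter-wavelength band."""
    return math.log10(flux_long / flux_short)


# Outlier 28 fluxes in μJy, taken from Table 4.
f36, f45, f58, f80 = 8.1, 22.5, 63.2, 119.0

print(round(log_colour(f45, f36), 2))  # log(f4.5/f3.6) -> 0.44
print(round(log_colour(f80, f58), 2))  # log(f8.0/f5.8) -> 0.27
```

Both values fall inside the ranges given above for this group of outliers.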
7.2. The Number Density of Outliers modelled as dust-enshrouded strongly obscured AGN

We determine the integrated number density (the total of 1/Vmax) of the outliers we have
modelled as strongly obscured AGN (outliers 28 - 34 in §7.1). From our outlier sample of
216 galaxies from Lockman and ELAIS-N1, we have identified a total of 12 outliers with
SEDs corresponding to that of obscured AGN. Since our sub-sample has zph ~ 2 - 4, we
determine the number density of these galaxies within this redshift interval.

Vmax is the volume corresponding to the maximum redshift at which a source could be
detected by the survey. We set this maximum redshift by considering a flux-limited sample
based on the mid-infrared 24 μm limit (311 μJy). All 12 outliers are found to have
24 μm detections that far exceed this limit.
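A 1/Vmax estimate of this kind can be sketched as below. Everything specific in the sketch is an assumption made for illustration: a flat Lambda-CDM cosmology with Ωm = 0.3, a placeholder survey area of 10 square degrees, and the simplification that every source is detectable throughout the 2 < z < 4 interval because its 24 μm flux lies far above the limit. Setting H0 = 100 km/s/Mpc returns distances in h^-1 Mpc, so the density comes out in h^3 Mpc^-3.

```python
import math

C_KM_S = 299792.458                                       # speed of light in km/s
FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2    # about 41253 square degrees


def comoving_distance(z, h0=70.0, omega_m=0.3, steps=2000):
    """Line-of-sight comoving distance in Mpc for a flat Lambda-CDM cosmology."""
    omega_l = 1.0 - omega_m
    dz = z / steps
    total = 0.0
    for i in range(steps + 1):
        zi = i * dz
        weight = 0.5 if i in (0, steps) else 1.0          # trapezoidal rule
        total += weight / math.sqrt(omega_m * (1.0 + zi) ** 3 + omega_l)
    return (C_KM_S / h0) * total * dz


def comoving_volume(z_low, z_high, area_deg2, **cosmology):
    """Comoving volume between two redshifts over a given sky area (flat universe)."""
    d_low = comoving_distance(z_low, **cosmology)
    d_high = comoving_distance(z_high, **cosmology)
    sky_fraction = area_deg2 / FULL_SKY_DEG2
    return sky_fraction * (4.0 / 3.0) * math.pi * (d_high ** 3 - d_low ** 3)


def number_density(z_max_values, z_low, z_high, area_deg2, **cosmology):
    """Sum of 1/Vmax, with each source's Vmax capped at the top of the interval."""
    density = 0.0
    for z_max in z_max_values:
        v_max = comoving_volume(z_low, min(z_max, z_high), area_deg2, **cosmology)
        density += 1.0 / v_max
    return density


# 12 sources, all assumed visible out to z = 4; the area is a placeholder value.
density = number_density([4.0] * 12, z_low=2.0, z_high=4.0, area_deg2=10.0, h0=100.0)
print(density)
```

With all sources visible through the whole interval, the estimate reduces to the number of sources divided by the survey volume between z = 2 and z = 4.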

We therefore find that these obscured systems have a number density of ~10^-? h^3
Mpc^-3 in the high redshift universe, corresponding to ~0.1% of the number density of
sub-millimetre galaxy populations (SMGs) identified by SCUBA at z ~ 2.5 (Chary & Elbaz 2001,
Chapman et al. 2005).



Table 4. Photometry of 34 outliers. Columns 5-8 show optical U, g', r' and i' magnitudes (AB), and columns 9-13 show
IRAC (3.6 μm, 4.5 μm, 5.8 μm, 8 μm) and MIPS 24 μm fluxes in μJy.

Outlier  Name                 RA[J2000]    DEC[J2000]   U      g'     r'     i'     3.6 μm     4.5 μm     5.8 μm     8 μm        24 μm
         SWIRE-J              (h:m:s)      (d:m:s)      (AB)   (AB)   (AB)   (AB)   (μJy)      (μJy)      (μJy)      (μJy)       (μJy)
1        104752.65+572844.6   10:47:52.65  +57:28:44.6  ...    21.22  20.18  19.59  73.4±1.0   53.6±1.1   44.6±4.8   267.1±5.6   239.2±34.6
2        105918.35+575042.1   10:59:18.35  +57:50:42.1  ...    20.51  19.88  19.26  56.5±0.9   36.2±1.3   34.1±4.7   165.1±6.5   423.4±33.4
3        104447.50+575343.8   10:44:47.50  +57:53:43.8  ...    20.92  20.16  19.67  40.1±0.8   25.3±1.0   49.1±5.1   182.5±5.4   ...
4        105635.36+583244.8   10:56:35.36  +58:32:44.8  ...    20.56  19.50  18.94  98.5±1.0   64.4±1.2   79.1±4.4   373.4±5.7   513.7±38.4
5        104514.97+580852.4   10:45:14.97  +58:08:52.4  ...    19.59  18.28  17.59  630.8±4.2  323.9±4.2  63.9±5.3   54.3±5.2    ...
6        104755.39+584814.2   10:47:55.39  +58:48:14.2  23.94  22.46  20.44  19.51  112.8±0.9  78.4±1.0   68.8±3.6   35.7±3.8    ...
7        104628.32+585228.7   10:46:28.32  +58:52:28.7  23.29  22.02  20.08  19.32  103.4±0.9  81.2±1.1   44.6±2.9   30.2±3.6    ...
8        105210.33+582027.5   10:52:10.33  +58:20:27.5  ...    19.89  18.35  17.65  174.0±1.6  95.6±1.6   33.6±4.9   49.5±5.5    ...
9        103913.43+594112.1   10:39:13.43  +59:41:12.1  ...    ...    23.13  22.05  231.3±2.3  97.6±2.3   128.7±7.1  194.6±9.4   891.6±20.6
10       104057.84+581808.2   10:40:57.84  +58:18:08.2  ...    ...    ...    20.66  175.5±1.5  83.4±1.3   74.7±4.5   97.2±5.3    633.5±17.5
11       104055.13+580206.6   10:40:55.13  +58:02:06.6  ...    21.60  20.66  20.45  12.0±0.7   12.5±0.9   43.1±4.9   56.7±5.3    1489.8±15.7
12       104807.13+584224.4   10:48:07.13  +58:42:24.4  20.58  21.10  20.51  19.97  62.7±0.8   67.3±0.9   80.3±3.9   106.9±3.9   742.4±15.7
13       104330.68+584928.9   10:43:30.68  +58:49:28.9  23.99  24.04  23.32  22.74  35.4±0.8   42.2±1.0   46.1±4.2   19.8±3.9    ...
14       105008.02+574801.5   10:50:08.02  +57:48:01.5  ...    24.51  ...    22.76  77.1±0.9   82.7±1.0   77.0±5.1   33.9±5.1    295.0±19.5
15       105928.00+572640.4   10:59:28.00  +57:26:40.4  ...    ...    ...    ...    9.8±0.5    22.5±0.8   40.3±3.8   62.7±4.5    238.1±15.7
16       105834.93+574725.3   10:58:34.93  +57:47:25.3  ...    ...    ...    ...    27.0±0.9   67.9±1.3   128.9±6.3  181.6±6.0   516.2±17.3
17       104659.42+584624.0   10:46:59.42  +58:46:24.0  ...    ...    ...    ...    21.4±0.7   51.2±1.1   87.8±4.2   125.4±4.7   607.7±14.2
18       103944.02+573639.1   10:39:44.02  +57:36:39.1  ...    22.67  22.05  21.53  9.5±0.7    17.7±1.2   77.9±6.1   152.7±6.5   1152.0±19.9
19       105700.41+583313.2   10:57:00.41  +58:33:13.2  ...    24.42  23.17  22.80  4.2±0.5    6.9±0.9    15.7±3.6   62.1±5.1    409.8±19.5
20       105844.93+574145.9   10:58:44.93  +57:41:45.9  ...    23.50  23.00  22.53  38.8±1.0   60.1±1.4   30.5±5.7   50.3±6.3    648.8±18.2
21       103752.16+575048.7   10:37:52.16  +57:50:48.7  ...    ...    24.04  23.14  110.9±1.5  266.4±1.9  566.2±8.8  1321.9±7.2  3727.3±20.5
22       103909.10+580946.0   10:39:09.10  +58:09:46.0  ...    24.00  23.64  23.06  12.1±0.8   29.6±1.3   50.2±6.2   119.2±6.6   396.2±19.3
23       104111.62+582123.0   10:41:11.62  +58:21:23.0  ...    ...    ...    23.06  70.3±1.3   124.1±1.6  110.1±6.4  475.2±6.6   1292.9±17.1
24       104000.20+582458.2   10:40:00.20  +58:24:58.2  ...    24.37  23.93  23.46  16.7±0.9   43.4±1.4   82.5±6.3   163.1±6.7   999.6±17.7
25       105817.58+581916.7   10:58:17.58  +58:19:16.7  ...    ...    ...    ...    16.8±0.7   26.5±1.0   84.6±5.1   235.4±5.6   974.6±19.7
26       103508.84+573737.7   10:35:08.84  +57:37:37.7  ...    ...    ...    ...    11.5±0.8   18.6±1.0   43.7±5.4   208.9±5.7   746.6±20.6
27       104148.12+590322.4   10:41:48.12  +59:03:22.4  ...    ...    ...    ...    9.3±0.7    22.0±0.9   41.8±5.2   108.4±4.8   393.6±20.3
28       104314.94+585606.3   10:43:14.94  +58:56:06.3  ...    ...    ...    ...    8.1±0.6    22.5±0.9   63.2±3.6   119.0±4.4   965.2±16.3
29       104024.03+571944.4   10:40:24.03  +57:19:44.4  ...    ...    ...    ...    10.8±0.7   25.0±1.2   45.9±5.8   165.0±6.5   723.7±18.0
30       105132.41+591355.3   10:51:32.41  +59:13:55.3  ...    ...    ...    ...    22.0±0.9   27.8±1.0   85.1±6.3   277.0±5.6   2406.3±18.9
31       104132.08+581508.5   10:41:32.08  +58:15:08.5  ...    ...    ...    ...    18.8±0.9   40.6±1.3   85.6±6.3   274.5±6.9   942.8±19.7
32       104931.60+554954.4   10:49:31.59  +55:49:54.4  ...    ...    ...    ...    31.9±1.1   101.0±1.6  309.5±7.3  940.4±7.9   3206.4±18.0
33       104337.28+575830.3   10:43:37.28  +57:58:30.3  ...    ...    ...    ...    3.1±0.6    9.6±0.9    58.9±4.8   146.1±5.4   559.4±18.3
34       103839.02+574533.9   10:38:39.01  +57:45:33.8  ...    ...    ...    ...    28.1±0.9   79.9±1.1   330.4±7.7  526.4±5.3   1617.5±16.2
Table 5. Optical/infrared properties of 34 outliers using SED template fitting.

Outlier  zph    PDF          log(Lopt)   log(LIR)   Best Template Fit
                             (L⊙)        (L⊙)
1        0.05   1.60x10^-?   9.96        10.81      Spiral - Sc
2        0.05   3.54x10^-5   9.73        10.58      Spiral - Sc
3        0.05   4.11x10^-?   9.59        10.45      Spiral - Sc
4        0.05   2.83x10^-5   10.02       10.87      Spiral - Sc
5        0.05   2.08x10^-6   11.37       7.67       Elliptical
6        0.49   2.40x10^-6   11.86       8.52       Elliptical
7        0.20   1.73x10^-6   11.75       7.24       Elliptical
8        0.05   8.05x10^-3   10.87       7.43       Elliptical
9        0.84   4.02x10^-?   12.47       13.32      Seyfert 2
10       0.83   0.01         12.42       13.27      Seyfert 2
11       0.07   4.66x10^-4   9.02        9.87       M82 Starburst
12       2.25   2.10x10^-5   12.69       12.84      QSO IR (blue qso)
13       2.41   2.94x10^-5   12.31       12.38      Spiral - Sd
14       1.43   0.01         11.97       11.62      NGC6090 Starburst
15       2.13   0.02         11.95       12.10      Type 1 QSO
16       1.82   1.38x10^-3   12.13       12.79      Type 1 QSO
17       2.14   4.43x10^-3   12.20       13.34      Mrk231
18       0.17   3.07x10^-5   9.07        10.67      IRAS22491
19       0.19   8.71x10^-?   8.76        10.40      IRAS22491
20       1.82   7.59x10^-?   12.04       13.72      IRAS22491
21       1.82   0.01         12.70       13.05      Mrk231
22       2.06   7.76x10^-3   12.03       12.69      Type 1 QSO
23       1.26   1.53x10^-3   11.82       13.19      IRAS19254
24       2.37   1.81x10^-3   12.42       13.11      Type 1 QSO
25       0.22   0.02         9.27        11.13      IRAS22491
26       0.21   1.37x10^-3   9.12        10.92      IRAS22491
27       0.21   0.01         9.10        10.45      IRAS20551
28       3.01   1.86x10^-3   12.25       13.10      Obscured AGN
29       2.79   5.26x10^-3   12.10       12.95      Obscured AGN
30       4.24   2.69x10^-4   13.22       14.07      Obscured AGN
31       1.95   0.02         11.69       12.55      Obscured AGN
32       2.48   4.33x10^-6   12.62       13.47      Obscured AGN
33       2.90   1.34x10^-?   12.09       12.94      Obscured AGN
34       2.41   5.16x10^-6   12.30       13.15      Obscured AGN

Column 2 shows the photometric redshift estimates, column 3 gives the sum of the probability density
values for each outlier, columns 4-5 show optical and infrared luminosity estimates based on template fits,
and column 6 shows the best-fit templates for the SEDs of each outlier.
Fig. 9. Spectral energy distributions (SEDs) of selected outliers (1-34), modelled using SWIRE galaxy
templates.

Fig. 10. Optical (U, g', r', i') and infrared (3.6 μm-70 μm) images of Outlier 28. Also shown is the best fitting template to the
SED of this object. NOTE: The orientation of the optical U, g', r', i' images is offset by ~45 degrees clockwise from the infrared
3.6 μm-70 μm images.

[Figure panels: SED plots of individual outliers, each titled "Best-fit template: Obscured AGN".]



8. Discussion and Conclusions 

We have presented a parametric model for describing SWIRE galaxy populations and
outliers in the fields of ELAIS-N1 and Lockman.

For 16,698 sources in ELAIS-N1 with detections in four IRAC bands, we use 3 IRAC
colours and an optically derived photometric redshift and find our data set is best described
by four Gaussian modes (Ca, Cb, Cc and Cd). We have determined the parameters of these
modes, providing an empirical description of this sub-set of the SWIRE galaxies, and shown
(with a χ² test) that our empirical model of four Gaussian modes provides a good description
of our SWIRE data set.

We also find that by using only 3 IRAC colours (i.e. excluding photometric redshift) 
our data set is still best modelled by the same four Gaussian distributions. 

We then find that synthetic data from two theoretical models provide a very poor
description of our empirical model. The model of Xu is found to significantly underpredict the
population of spiral galaxies with infrared colour 0.2 < log(f8.0/f5.8) < 0.6. The GalICS
model fails to account for AGN and ULIRGs with log(f4.5/f3.6) > 0. This demonstrates
that predictions from these theoretical models are clearly inadequate for describing
the wealth of data in the SWIRE survey. When such models are available, we have illustrated
how our simple parametric description of the SWIRE colour distribution can be used as a
powerful model discriminator, entirely complementary to comparisons of number counts.

We then use optical/infrared template colours and star formation rate/stellar mass 
colour indicators to determine the galaxy types that exist in each of our four distributions. 

Galaxies in population Ca (magenta) are dusty systems with high levels of star-formation
activity such as ULIRGs, over a broad redshift range. Population Cb (orange) contains
low redshift galaxies dominated by PAH emission at 8 μm, characteristic of late-type spiral
systems. Population Cc (green) contains sources at intermediate redshifts. The star-formation
activity of this population is less than that of the ULIRG population of Ca, but higher than
that of the late-type spiral systems of population Cb, indicating that Cc is dominated by
spiral and dusty starburst systems. Population Cd (blue) is dominated by elliptical and
early-type spiral galaxies at low redshift (mean zph = 0.32), which correspond to the bi-modality
found using optical colours. Since all our distributions were identified using IRAC colours,
which at low redshift detect old stars where ellipticals and early-type spirals are similar, the
two galaxy classes are found within this single mode.

We then devise a new technique for identifying unusual sources in the fields of ELAIS-N1
and Lockman. We identify a total of 216 candidate outliers.

Analysing a selection of these outliers we find that sources with blue log(f4.5/f3.6) colours
are star-forming/Seyfert 2 systems where sharp PAH features are responsible for their peculiar
IRAC colours at particular redshifts. Outliers with red log(f4.5/f3.6) colour are found to
be dusty star-forming systems such as ULIRGs/Mrk231, where PAH features have thrown
out the IRAC colours at certain redshifts.

We also identify a sub-sample of 12 galaxies with very red infrared SEDs and no optical
detections. Best modelled as obscured AGN at redshifts of zph ~ 2 - 4, these sources would
be very infrared luminous, with LIR ~ 10^? - 10^? L⊙, and would correspond to ~0.1% of the
number density of sub-millimetre galaxies in the high redshift universe.

We expect such extreme outliers to be dust-enshrouded systems with a strongly obscured 
AGN. The high mid-infrared emission may result from dust being heated by the AGN, 
similar to Compton-thick AGN (Polletta et al. 2006a). The hyperluminous gravitationally 
lensed galaxy IRAS F10214+4724 at z=2.29 (Rowan-Robinson et al. 1991 and Lacy et al. 
1998) is known to contain a dust-enshrouded AGN, and two ultraluminous high-redshift dusty 
galaxies, H 1413+117 (Barvainis et al. 1995) and APM08279+5255 (Irwin et al. 1998), also 
contain powerful AGN that similarly dominate their infrared emission. However, a starburst 
component in these galaxies cannot be completely ruled out. Therefore, an alternative 
interpretation could be that these systems have high mid-infrared emission because they are 
going through their maximal star formation whilst harbouring AGN activity (Lonsdale et al. 
2006, in prep.). This present-day level of star formation could be related to strong negative 
feedback effects limiting Population III star formation during the earliest galaxy formation 
era (Sokasian et al. 2004). Therefore, we could now be seeing the first episodes of substantial 
star formation occurring in these individual galaxies. The presence of a starburst component 
in these galaxies would correspond to infrared star-formation rates in the range 1.5×10^(…) - 
5.2×10^(…) M⊙ yr^-1. Identifying such galaxies was a key goal of SWIRE. 

The classification technique we have used in this paper is an efficient, automated way 
of identifying sub-samples of galaxies with common photometric properties. An arbitrary 
number of input parameters can be employed, giving the method great flexibility, particularly 
when additional information such as photometric redshifts is available. In addition, this 
technique is capable of classifying sources with similar photometric properties without making 
any prior assumptions about what these objects may be (cf. SED template fitting). 
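
As an illustration of how such a classification can proceed once a mixture model has been fitted, the following Python sketch assigns a source to the mode with the highest membership probability. It assumes each mode is a normalised N-dimensional Gaussian weighted by its amplitude. The function names, the two-mode model and all parameter values are hypothetical and are not taken from the fits in this paper.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Normalised N-dimensional Gaussian density evaluated at a single point x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** mean.size * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def classify(x, amplitudes, means, covs):
    """Return (index of the most probable mode, membership probabilities)."""
    weighted = np.array([a * gaussian_pdf(x, m, c)
                         for a, m, c in zip(amplitudes, means, covs)])
    probs = weighted / weighted.sum()  # probabilities sum to one
    return int(np.argmax(probs)), probs

# Hypothetical two-mode model in a 2-D (colour, redshift) space.
amplitudes = [0.4, 0.6]
means = [[0.6, 0.2], [0.1, 1.0]]
covs = [[[0.03, 0.0], [0.0, 0.02]],
        [[0.02, 0.005], [0.005, 0.30]]]

label, probs = classify([0.55, 0.25], amplitudes, means, covs)
```

Because the assignment depends only on the fitted amplitudes, means and covariance matrices, adding a further input parameter changes the length of the vectors but not the procedure.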

Another important feature of the technique we have used is the way it handles multi-dimensional 
data. Classical techniques use 2D projections, e.g. the identification of AGN in infrared 
colour space (Lacy et al. 2004), simulation of the mid-infrared Spitzer colours (Sajina 
et al. 2005), and a colour-based classification of SWIRE sources using template libraries 
(Polletta et al. 2006b). While these will be roughly consistent when applied to low-dimensionality 
data, the limitations of projections will become more apparent as we move to higher 
dimensionality. Further, we can classify objects using a range of colours in the optical, near-, 
mid- and far-infrared, include photometric redshifts, stellarity and morphology, and so create an 
"overall classification" based on more than the photometric properties of sources. 

Fully automated techniques such as this and the complementary SED template fitting 
method will be essential for further analysis of the SWIRE fields. 
APPENDIX 

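The N-dimensional Probability Density Function (PDF) is defined as (see §3.1): 

$$
\mathrm{PDF}(\mathbf{x}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) = A_i \, \frac{1}{\sqrt{(2\pi)^N \, |\boldsymbol{\Sigma}_i|}} \exp\left[ -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu}_i)^{T} \boldsymbol{\Sigma}_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i) \right]
$$

where: 

$A_i$ represents the amplitude of each Gaussian 

$\mathbf{x}$ represents the co-ordinates of each galaxy 

$\boldsymbol{\mu}_i$ represents the mean co-ordinate of each Gaussian 

$N$ is the number of dimensions of the Gaussians 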

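and the covariance matrix $\boldsymbol{\Sigma}_i$ of each distribution ($i$) in $N$-dimensional parameter space is 
defined as: 

$$
\boldsymbol{\Sigma}_i =
\begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1N} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{N1} & \sigma_{N2} & \cdots & \sigma_N^2
\end{pmatrix}
$$

where: 

the variance $\sigma_N^2$ of each distribution is defined as $\sigma_N^2 = \langle (x_N - \mu_N)^2 \rangle$ 

the covariance $\sigma_{NM}$ ($N \neq M$) is defined as $\sigma_{NM} = \langle (x_N - \mu_N)(x_M - \mu_M) \rangle$ 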
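
The definitions above can be sketched numerically as follows. This is a minimal Python illustration, assuming the standard normalised form of the N-dimensional Gaussian (prefactor 1/sqrt((2π)^N |Σ|) and a factor of 1/2 in the exponent) and a covariance estimated as a plain average over the points. The function names and the test values are hypothetical.

```python
import numpy as np

def mixture_pdf(x, amplitudes, means, covs):
    """Sum of amplitude-weighted N-dimensional Gaussians evaluated at x."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for a, mu, sigma in zip(amplitudes, means, covs):
        mu = np.asarray(mu, dtype=float)
        sigma = np.asarray(sigma, dtype=float)
        diff = x - mu
        norm = np.sqrt((2.0 * np.pi) ** mu.size * np.linalg.det(sigma))
        total += a * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm
    return total

def sample_covariance(points):
    """Matrix with elements <(x_N - mu_N)(x_M - mu_M)> averaged over the points."""
    points = np.asarray(points, dtype=float)
    centred = points - points.mean(axis=0)
    return centred.T @ centred / len(points)

# A single unit Gaussian in 2-D evaluated at its mean.
peak = mixture_pdf([0.0, 0.0], [1.0], [[0.0, 0.0]], [np.eye(2)])

# Four hypothetical points at the corners of a square.
pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
cov = sample_covariance(pts)
```

The diagonal of the returned matrix holds the variances and the off-diagonal elements hold the covariances, so the matrix is symmetric by construction.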
Table 6. The amplitude (A), mean (μ), variance σ_N² and covariance σ_NM (N≠M) of each 
distribution in 3-colour-plus-redshift space. Errors assigned to these values are based on the 
variation between the ELAIS-N1 and Lockman data sets. The covariance matrix describing each 
distribution corresponds to galaxies above the SWIRE 5σ flux limits - see §2.1. 

               Ca                 Cb                  Cc                  Cd
-------------------------------------------------------------------------------------------
A              0.11 ± 0.02        0.18 ± 0.06         0.25 ± 0.03         0.46 ± 0.03
μ1             0.13 ± 0.01        0.66 ± 0.06         -0.05 ± 0.01        0.17 ± 0.05
μ2             0.12 ± 0.02        -0.10 ± 0.02        -0.02 ± 0.02        -0.14 ± 0.01
μ3             0.07 ± 0.02        -0.10 ± 0.03        -0.16 ± 0.01        -0.09 ± 0.01
μ4             1.28 ± 0.09        0.17 ± 0.03         0.73 ± 0.11         0.32 ± 0.04
σ1²            0.018 ± 0.003      0.033 ± 0.009       0.022 ± 0.006       0.064 ± 0.011
σ2²            0.017 ± 0.002      0.013 ± 0.001       0.011 ± 0.004       0.012 ± 0.003
σ3²            0.009 ± 0.002      0.005 ± 0.002       0.002 ± 0.002       0.003 ± 0.001
σ4²            0.336 ± 0.057      0.015 ± 0.005       0.052 ± 0.017       0.018 ± 0.011
σ12 or σ21     0.004 ± 0.002      -0.003 ± 0.001      -0.001 ± 0.003      0.004 ± 0.006
σ13 or σ31     0.005 ± 0.002      -0.002 ± 0.002      0.003 ± 0.001       0.009 ± 0.001
σ14 or σ41     0.003 ± 0.01       -0.005 ± 0.007      0.002 ± 0.003       0.007 ± 0.007
σ23 or σ32     0.006 ± 0.002      -0.003 ± 0.0005     0.0004 ± 0.0007     0.002 ± 0.001
σ24 or σ42     0.005 ± 0.012      -0.004 ± 0.0003     -0.003 ± 0.008      0.006 ± 0.004
σ34 or σ43     0.013 ± 0.006      0.006 ± 0.002       0.0007 ± 0.002      0.003 ± 0.002

1=log(f8/f5.8), 2=log(f5.8/f4.5), 3=log(f4.5/f3.6) and 4=photometric redshift 
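
To show how the entries of Table 6 map onto a covariance matrix, the Python sketch below assembles the 4x4 matrix for population Ca from the tabulated central values (the quoted errors are ignored) and computes the Mahalanobis distance of a source from the Ca mean. The helper name `build_covariance` and the example source are hypothetical; only the Ca means, variances and covariances are taken from the table.

```python
import numpy as np

def build_covariance(variances, covariances):
    """Assemble a symmetric matrix from the diagonal variances and a dict of
    off-diagonal terms keyed by 1-based index pairs (N, M) with N < M."""
    sigma = np.diag(np.asarray(variances, dtype=float))
    for (n, m), value in covariances.items():
        sigma[n - 1, m - 1] = value
        sigma[m - 1, n - 1] = value  # sigma_NM = sigma_MN
    return sigma

# Central values listed for population Ca in Table 6, in the order
# 1=log(f8/f5.8), 2=log(f5.8/f4.5), 3=log(f4.5/f3.6), 4=photometric redshift.
mean_ca = np.array([0.13, 0.12, 0.07, 1.28])
var_ca = [0.018, 0.017, 0.009, 0.336]
cov_ca = {(1, 2): 0.004, (1, 3): 0.005, (1, 4): 0.003,
          (2, 3): 0.006, (2, 4): 0.005, (3, 4): 0.013}

sigma_ca = build_covariance(var_ca, cov_ca)

# Hypothetical source (not a real SWIRE object), in the same parameter order.
source = np.array([0.20, 0.10, 0.05, 1.00])
diff = source - mean_ca
mahalanobis = float(np.sqrt(diff @ np.linalg.solve(sigma_ca, diff)))
```

The same construction applies to the other three populations. Because the tabulated values are rounded, it is worth checking that each assembled matrix is positive definite before inverting it.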